Soil Genesis and Classification
Vahideh Sadeghizadeh; Seyed Ali Abtahi; Majid Baghernejad; Azam Jafari; Seyed Ali Akbar Moosavi
Abstract
Introduction The number of environmental variables used in digital soil mapping has increased rapidly, making it a challenge to select and focus on the most important covariates. Environmental covariates do not all have the same predictive power in modeling, and some covariates may introduce noise that reduces the predictive power of the models used. On the other hand, it is beneficial to identify all environmental variables to obtain spatial information that can improve predictions. In this regard, feature selection algorithms help reduce the dimensionality of the predictive model by identifying the relevant covariates. Therefore, this study aims to investigate different feature selection algorithms for selecting auxiliary variables and to evaluate their effect on the predictive model. Materials and Methods The study area is a part of Darab city in the southeast of Fars province, covering about 31,000 hectares. In the study area, 140 profiles were located and excavated according to the diversity of geomorphological units and thus of soil types. After the profiles were excavated and the morphological characteristics of each soil profile were described, sufficient soil samples were collected from the genetic horizons and transported to the laboratory for further analysis. Some physical and chemical properties of the soils were measured using accepted techniques after air drying and passing through a 2 mm sieve. Finally, all profiles were classified to the great group level using U.S. Soil Taxonomy, based on field observations and the results of laboratory analyses. The environmental variables include parameters derived from the Digital Elevation Model, Landsat 8 images, and geology and geomorphology maps of the study area. All parameters were derived using ArcGIS, SAGA GIS and ENVI software.
In the present study, four feature selection techniques, namely Variance Inflation Factor (VIF), Principal Component Analysis (PCA), Boruta and Recursive Feature Elimination (RFE), were used to identify an optimal set of covariates for the spatial prediction of soil classes at the great group level. In addition, a Random Forest (RF) model with 10-fold cross-validation repeated five times was used to compare the feature selection strategies in soil class mapping. The feature selection techniques were compared using two evaluation criteria computed between observed and predicted values: overall accuracy and the Kappa coefficient. Results and Discussion The results showed that prediction accuracy increased when the variables selected by the feature selection methods were used instead of all variables. In addition, the improvement in predictive performance differed among the four feature selection methods. The VIF and PCA methods had the highest and lowest accuracy and Kappa coefficient, respectively. The Boruta method, which retained the fewest variables, gave the largest improvement in model performance after VIF. However, the Kappa coefficient showed poor agreement between predicted and observed values for all approaches. The imbalance among soil classes could be one reason for the low accuracy and Kappa coefficient. Nevertheless, the random forest model, with and without feature selection, identified all soil great groups in the study area. Therefore, it can be concluded that the Random Forest algorithm is a very powerful technique for spatial prediction of soil classes in the study area. Although model performance varied with the feature selection algorithm, the predicted soil maps had similar spatial patterns.
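Of the four feature selection techniques, VIF is the simplest to show in a few lines. The sketch below is a minimal pure-Python illustration, not the authors' implementation. It relies on the standard identity that VIF_j = 1/(1 - R_j²), where R_j² comes from regressing covariate j on all the others, equals the j-th diagonal element of the inverse correlation matrix, and it repeatedly drops the covariate with the largest VIF. The cutoff of 10 and the function names (`correlation_matrix`, `invert`, `vif`, `select_by_vif`) are assumptions for illustration; the abstract does not state which threshold was used.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length numeric columns (no constant columns)."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    norms = [math.sqrt(sum((x - m) ** 2 for x in c)) for c, m in zip(columns, means)]
    k = len(columns)
    return [
        [
            sum((a - means[i]) * (b - means[j]) for a, b in zip(columns[i], columns[j]))
            / (norms[i] * norms[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), i.e. the j-th diagonal entry of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


def select_by_vif(data, threshold=10.0):
    """Drop the covariate with the largest VIF until every remaining VIF <= threshold.

    `data` maps hypothetical covariate names to value lists; threshold=10 is an
    assumed, commonly used cutoff, not a value taken from the study.
    """
    names = list(data)
    while len(names) > 1:
        scores = vif([data[name] for name in names])
        worst = max(range(len(names)), key=lambda i: scores[i])
        if scores[worst] <= threshold:
            break
        names.pop(worst)
    return names
```

A stricter cutoff (5 is also common) keeps fewer covariates. PCA, Boruta and RFE would each need separate code; Boruta and RFE both wrap a predictive model such as a random forest rather than looking only at the covariates.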
According to the map predicted by the model with the VIF-selected variables, Ustorthents are mainly located in high-altitude regions with steep slopes. The Haplustepts, Calciustepts and Calciusterts great groups have developed on low to medium slopes. Haplosalids have developed downstream of the salt dome. Ustifluvents were found on fluvial sedimentary plains. Endoaquepts were found in the floodplains and occupied the smallest area on the predicted map. Conclusion Overall, the findings indicate that feature selection methods can exploit significant dependencies among relevant covariates to predict soil classes and improve modeling accuracy. In the current study, the environmental factors derived from the Digital Elevation Model were selected as key variables, showing the importance of topography and morphology in the classification of soil types in the area. Although the selected variables improved model performance, agreement between predicted and observed soil classes remained close to what would be expected by chance, as the low Kappa values indicate. This could be attributed to the imbalance among soil classes.
Ehsan Ghojehpour; Vahidreza Jalali; Azam Jafari; Majid Mahmoodabadi
Abstract
Introduction Spatial and temporal variations of soil characteristics occur at both large and small scales. Investigating the variability of soil parameters is one of the requirements for proper management of fertilizer resources in a sustainable agricultural system. Studying these variations is very time-consuming and costly, especially at large scales. Various interpolation techniques have been developed and applied for fast and reliable determination of soil properties, the most widely used being the different types of kriging. The copula function is a newer interpolation technique that has recently been used in fields such as hydrology. Thus, the aim of this research was to evaluate the spatial variation of some soil chemical properties using the copula function and to compare it with geostatistical techniques. Materials and Methods Sampling on a regular grid was carried out over an area of 484 ha located 10 km west of Baft city, Kerman province, central Iran (latitude 29° 15′ N, longitude 56° 29′ E). Agricultural, pasture and industrial sites are located close to one another within the study area. The common crops of the region are wheat, barley, alfalfa and legumes, along with orchards of walnuts, pomegranates, almonds and grapes. The average elevation of the study area is 2270 m above sea level, the average annual temperature is 16 °C, and the average annual precipitation is 247 mm. Soil was sampled from the 0-20 cm depth. The 121 soil samples were air-dried, and some physical and chemical properties were measured. To fit the copula function to the data, an appropriate marginal distribution function must first be fitted to each variable. For this purpose, three goodness-of-fit tests were used: Kolmogorov-Smirnov, Anderson-Darling and Chi-Square. These tests were carried out in the EasyFit 5.5 statistical software.
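The goodness-of-fit step can be illustrated with the Kolmogorov-Smirnov statistic D, the largest vertical gap between the empirical CDF and a candidate CDF. The sketch below is hypothetical: EasyFit's Wakeby and gamma fits are not reproduced, so two stdlib-friendly candidates (normal and lognormal, fitted by sample moments via `statistics.NormalDist.from_samples`) stand in for them, and the candidate with the smallest D is ranked first. Because the parameters are estimated from the same sample, D is used here only for ranking, not for a formal p-value. The Anderson-Darling and Chi-Square tests are not shown.

```python
import math
from statistics import NormalDist


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov D = sup_x |F_n(x) - F(x)|, evaluated at the sorted sample points."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so check both sides of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def fit_candidates(sample):
    """Hypothetical stand-in candidates fitted by sample moments (not Wakeby/gamma)."""
    normal = NormalDist.from_samples(sample)
    log_normal = NormalDist.from_samples([math.log(x) for x in sample])  # needs x > 0
    return {
        "normal": normal.cdf,
        "lognormal": lambda x: log_normal.cdf(math.log(x)),
    }


def rank_marginals(sample):
    """Return (D, name) pairs, best-fitting (smallest D) first."""
    return sorted((ks_statistic(sample, cdf), name) for name, cdf in fit_candidates(sample).items())
```

For the winning candidate, evaluating its `cdf` at each observation gives the cumulative values F(x_i) that the copula step works with.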
Once the best marginal distribution function has been fitted, its cumulative value is calculated for each data point. After these values have been calculated, copula functions can be fitted to the data. Finally, the accuracy of each interpolation method was evaluated using the root mean square error (RMSE), coefficient of determination (R2), mean absolute error (MAE) and mean bias error (MBE). Results and Discussion In all geostatistical methods, the first step in interpolation is to fit a semivariogram to the measured data; therefore, after the data were normalized and the models validated, the appropriate model was selected for fitting the semivariogram. Among the measured parameters, the semivariograms of available phosphorus (Pava) and available potassium (Kava) followed the spherical model, and these variables were interpolated on the basis of this model. Copula analysis showed that available phosphorus and potassium followed the Wakeby and gamma distributions, respectively. Also, based on the Pearson correlation coefficient, pairs of points were correlated at separation distances of less than 2000 m, and distances greater than 2000 m were treated as independent. Based on the validation criteria, the methods ranked from best to worst for Pava were median copula function, average copula function, IDW, ordinary kriging, disjunctive kriging, universal kriging and simple kriging; for Kava the ranking was median copula function, average copula function, ordinary kriging, universal kriging, disjunctive kriging, simple kriging and IDW. The coefficient of determination (R2) of the copula function for available phosphorus and potassium was 5% and 4% higher, respectively, than that of conventional geostatistical techniques.
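The semivariogram step can be sketched as follows, assuming sample coordinates in metres; all function and parameter names are hypothetical. `empirical_semivariogram` applies the classical estimator gamma(h) = sum((z_i - z_j)^2) / (2 N(h)) within distance bins of width `lag`; `spherical` is the spherical model reported for Pava and Kava; `fit_spherical` tries each candidate range and, for that range, solves pair-count-weighted least squares for the nugget and partial sill. This weighting and the grid search over the range are illustrative choices, not necessarily what the authors' software did, and the fit can return a negative nugget if the data do not support one.

```python
import math
from itertools import combinations


def empirical_semivariogram(points, values, lag, n_lags):
    """Classical estimator: gamma(h) = sum (z_i - z_j)^2 / (2 N(h)) per distance bin."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        k = int(math.dist(p, q) // lag)
        if k < n_lags:
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    # One (bin centre, semivariance, pair count) triple per non-empty bin.
    return [((k + 0.5) * lag, sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_lags) if counts[k]]


def spherical(h, nugget, partial_sill, range_a):
    """Spherical model: rises as 1.5r - 0.5r^3 (r = h/a), constant at the sill beyond a."""
    if h <= 0:
        return 0.0
    if h >= range_a:
        return nugget + partial_sill
    r = h / range_a
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


def fit_spherical(bins, candidate_ranges):
    """Return (sse, nugget, partial_sill, range) for the best candidate range, or None."""
    best = None
    for a in candidate_ranges:
        w = [n for _, _, n in bins]                             # weight = pair count
        g = [spherical(h, 0.0, 1.0, a) for h, _, _ in bins]     # unit-sill shape
        y = [gamma for _, gamma, _ in bins]
        sw = sum(w)
        sg = sum(wi * gi for wi, gi in zip(w, g))
        sgg = sum(wi * gi * gi for wi, gi in zip(w, g))
        sy = sum(wi * yi for wi, yi in zip(w, y))
        sgy = sum(wi * gi * yi for wi, gi, yi in zip(w, g, y))
        det = sw * sgg - sg * sg
        if abs(det) < 1e-12:
            continue  # shape is constant over the bins, so nugget and sill are not separable
        sill = (sw * sgy - sg * sy) / det
        nugget = (sy - sill * sg) / sw
        sse = sum(wi * (yi - nugget - sill * gi) ** 2 for wi, gi, yi in zip(w, g, y))
        if best is None or sse < best[0]:
            best = (sse, nugget, sill, a)
    return best
```

For a fixed range the spherical model is linear in the nugget and partial sill, which is why only the range needs a search; the two remaining parameters come from 2x2 normal equations.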
The estimation error was also smaller for the copula function, indicating its better performance in estimating these soil properties. Conclusion This study investigated the feasibility of using the copula function to predict some soil nutrients and compared this method with widely used geostatistical methods. Our results demonstrated that the copula function method is more capable than classical geostatistical methods in estimating soil properties, because it does not depend on the normality of the data distribution or on the presence of outliers. Therefore, given a reliable, high-quality database of soil characteristics, this method can be used to produce acceptable maps of other soil characteristics at various scales.
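The four validation indices used to compare the interpolators (RMSE, R2, MAE and MBE) can be computed from paired observed and predicted values as in the sketch below. The definitions are common ones and are assumptions here: errors are taken as predicted minus observed, so a positive MBE means overestimation, and R2 is computed as 1 - SS_res/SS_tot. Some studies instead report the squared Pearson correlation between observed and predicted values, which can give a different number.

```python
import math


def validation_indices(observed, predicted):
    """RMSE, MAE, MBE and R2 for paired observed/predicted values (assumed definitions)."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equally long")
    errors = [p - o for o, p in zip(observed, predicted)]  # predicted minus observed
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MBE": sum(errors) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }
```

The same function can score copula and kriging predictions at the same validation points, so the rankings are computed on identical data.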
Soil Genesis and Classification
Farideh Abbaszadeh Afshar
Abstract
Introduction Mapping the spatial distribution of soil taxonomic classes is important for informing soil use and management decisions. Digital soil mapping (DSM) can quantitatively predict the spatial distribution of soil taxonomic classes. DSM is the computer-assisted production of digital maps of soil types and soil properties. It typically involves mathematical and statistical models that combine information from soil observations with information contained in correlated variables and remote sensing images. Machine learning is a general term for a broad set of models used to discover patterns in data and to make predictions. Although machine learning is most often applied to large databases, it is an attractive tool for learning about and making spatial predictions of soil classes, because the relationships between soil classes and environmental covariates are often poorly understood. Our objective was to compare multiple machine learning models (multinomial logistic regression, boosted regression trees and decision tree) for predicting soil great groups in Bam district, Kerman province. Materials and Methods The study area, Bam district, is located between 58°4′17″ and 58°28′8″ E longitude and 28°52′51″ and 29°9′29″ N latitude (Fig. 1) in Kerman province, southeastern Iran. The area is surrounded by mountains (dominantly limestone and volcanic) from northwest to southeast, and its major landforms include young alluvial fans, pediments, clay flats and hills. The mean annual precipitation, temperature and potential evapotranspiration are 64 mm, 23.8 °C and 3000 mm, respectively, with aridic and hyperthermic soil moisture and temperature regimes. A stratified sampling scheme was defined over 100,000 hectares, and 126 soil profiles were excavated and described according to the Keys to Soil Taxonomy.
The models used in this study were multinomial logistic regression (MLR), boosted regression trees (BRT) and decision tree (DT). We used an 80/20 training/testing split (80% of the pedon observations were used for model training and 20% for model testing). The Kappa index (KI), overall accuracy (OA), Brier score (BS), user's accuracy (UA) and producer's accuracy (PA) were used to compare model accuracy. Results and Discussion The profile descriptions revealed two soil orders, Entisols and Aridisols, subdivided into six suborders and eight great groups: Haplosalids, Haplocambids, Haplocalcids, Haplogypsids, Calcigypsids, Calciargids, Petrocalcids and Torriorthents. The presence of eight great groups testifies to the high pedodiversity of the study area. The results showed that the geomorphology map contributed most to prediction accuracy. This may be because the geomorphic surfaces formed recently, or during a geological period when soil formation took place under conditions close to those of present-day processes in arid regions. After geomorphic surface, terrain attributes and then remote sensing indices were the most important predictors. The best prediction was obtained when predictors derived from terrain, remote sensing and geomorphological processes were used together, and when geomorphological processes were differentiated and the overall heterogeneity of the study area was identified and stratified. In areas where the predictors were more homogeneously distributed, the models captured the relationship between predictors and response better. The spatial distribution of soils in the study area followed the distribution pattern of most geomorphological and terrain attributes. Model comparison indicated that the decision tree was consistently the most accurate.
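The accuracy measures used to compare the three models can be sketched in pure Python as follows. This is an illustrative implementation under stated assumptions, not the authors' code. The confusion matrix has observed classes in rows and predicted classes in columns; producer's accuracy is correct/observed total and user's accuracy is correct/predicted total; Cohen's kappa is (p_o - p_e)/(1 - p_e); and the Brier score uses the multi-category form (mean over samples of the summed squared differences between predicted class probabilities and the 0/1 outcome, ranging from 0 to 2). Function names are hypothetical.

```python
def confusion_matrix(observed, predicted, classes):
    """Counts with observed classes in rows and predicted classes in columns."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for o, p in zip(observed, predicted):
        m[index[o]][index[p]] += 1
    return m


def accuracy_metrics(observed, predicted, classes):
    """Overall accuracy, Cohen's kappa, and per-class user's/producer's accuracy."""
    m = confusion_matrix(observed, predicted, classes)
    k = len(classes)
    n = sum(map(sum, m))
    correct = [m[i][i] for i in range(k)]
    obs_tot = [sum(m[i]) for i in range(k)]                        # row sums
    pred_tot = [sum(m[i][j] for i in range(k)) for j in range(k)]  # column sums
    p_o = sum(correct) / n
    p_e = sum(r * c for r, c in zip(obs_tot, pred_tot)) / n ** 2   # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)  # undefined if p_e == 1 (only one class present)
    nan = float("nan")
    user = {c: correct[i] / pred_tot[i] if pred_tot[i] else nan for i, c in enumerate(classes)}
    producer = {c: correct[i] / obs_tot[i] if obs_tot[i] else nan for i, c in enumerate(classes)}
    return p_o, kappa, user, producer


def brier_score(observed, probabilities, classes):
    """Multi-category Brier score; each entry of `probabilities` maps class -> probability."""
    total = 0.0
    for o, probs in zip(observed, probabilities):
        total += sum((probs.get(c, 0.0) - (1.0 if c == o else 0.0)) ** 2 for c in classes)
    return total / len(observed)
```

Kappa discounts the agreement expected by chance from the class totals, which is why it can stay low even when overall accuracy looks reasonable on imbalanced classes.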
The prediction accuracy by soil group was highest for the Haplosalids, Calcigypsids and Petrocalcids great groups, and lowest for Haplocalcids with all three approaches. As a reliable and flexible approach, the decision tree could be used successfully to prepare continuous digital soil maps. Conclusion The application of decision trees to the prediction of soil types could be a promising alternative. In digital soil mapping, the best prediction was obtained when parameters derived from terrain, remote sensing and geomorphological processes were used together, and when geomorphological processes were differentiated and the overall heterogeneity of the study area was identified and stratified. In areas where the predictors were more homogeneously distributed, the models captured the relationship between predictors and response better. Altogether, an extended digital terrain analysis approach, together with a clear description of geomorphological, geological and pedological processes, could be a promising key technology in future soil mapping.
Soil Chemistry and Pollution
M. Ayeneh Heydari; M. Hejazi Mehrizi; A. Jafari; M. Yousefifard
H. Shamsaldin; V. Jalali; A. Jafari
M. Moeini; M. Hejazi Mehrizi; A. Jafari
R. Taghizadeh-Mehrjardi; F. Sarmadian; A. A. Zolfaghari; A. Jafari
Abstract
Introduction: Cation exchange capacity (CEC) has long been an input parameter of many environmental models (Manrique et al., 1991). In addition, CEC data allow a clearer and more complete interpretation of soil and plant nutrition processes and, consequently, of fertilizer and soil amendment requirements. Laboratory analysis is the most accurate method for direct measurement of CEC. However, direct measurement of CEC is difficult, particularly in the soils of arid and semi-arid regions of Iran, because large amounts of calcium carbonate make the measurement expensive, laborious and time-consuming (Amini et al., 2005). An appropriate alternative is to predict CEC from readily available properties by developing parametric or nonparametric methods (Minasny et al., 1999). Therefore, the objectives of this study were to compare and apply different data mining approaches, including multi-linear regression (MLR), multi-nonlinear regression (MNR), cascade neural network (CNN), two radial basis function (RBF) networks, multi-layer perceptron neural network (MLP) and adaptive neuro-fuzzy inference system (ANFIS), to estimate cation exchange capacity in different soils of Iran. Materials and Methods: For this purpose, 1770 soil samples were selected from different sites in Iran, of which 356 were used as testing data and the remaining 1414 as training data. The soil samples were dried, crushed and passed through a 2 mm sieve for physical and chemical analyses. The percentages of sand (50-2000 µm), silt (2-50 µm) and clay (<2 µm) were determined by the hydrometer method according to the USDA soil textural classification system. Soil organic carbon was determined by the Walkley-Black method, and CEC was measured by the standard method. The data mining techniques (i.e. MLR, MNR, CNN, RBF, MLP, ANFIS) were then applied to predict CEC from readily available data (i.e. soil organic carbon and clay percentages).
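The simplest of the compared methods, MLR, predicts CEC as a linear function of readily available properties. The sketch below fits CEC = b0 + b1·clay + b2·OC by ordinary least squares (normal equations solved by Gaussian elimination) in pure Python. The function names are hypothetical, and the study's fitted coefficients are not given in the abstract, so none are reproduced here.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(predictors, response):
    """OLS fit of response = b0 + b1*x1 + ... + bp*xp via the normal equations X'X b = X'y."""
    rows = [[1.0] + list(p) for p in predictors]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, predictors):
    """Apply fitted coefficients [b0, b1, ...] to rows of predictor values."""
    return [coefs[0] + sum(b * x for b, x in zip(coefs[1:], p)) for p in predictors]
```

Each predictor row would hold (clay %, organic carbon %) and the response CEC in meq 100g-1; with the split described above, `fit_mlr` would be fitted on the 1414 training samples and `predict` evaluated on the 356 testing samples. Normal equations are adequate for two or three inputs; with many or strongly correlated inputs a QR-based solver is more stable.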
Finally, to compare the efficiency of these techniques, several error criteria were applied, including root mean square error (RMSE), mean error (ME), coefficient of determination (R2) and relative improvement (RI). The uncertainty of the pedotransfer functions was also estimated using the Monte Carlo technique. Results and Discussion: Statistical analyses indicated that soil organic matter and soil texture had the highest variation; for example, SOM ranged from 0.01 to 2.94. Correlation analysis showed that CEC is most strongly related to clay and soil organic matter content. Thus, clay, silt, sand and organic carbon content were the input (independent) variables, representing readily available properties, and CEC was the output (dependent) variable in this study. The root mean square errors (RMSE) of linear and nonlinear regression were 4.74 and 4.71 meq 100g-1, respectively, indicating that both methods predict CEC about equally well. The nonlinear regression equation increased prediction accuracy by 0.6%. The results show that nonparametric artificial neural networks did not significantly increase the accuracy of CEC prediction; the best neural network result was obtained with MLP. The nonparametric regression tree was slightly more accurate than the artificial neural network methods (RMSE of 4.53 and 4.61 meq 100g-1, respectively). The best method for predicting CEC was ANFIS (RMSE = 4.02 meq 100g-1), whose accuracy was 15% higher than that of linear regression. Moreover, applying the ANFIS model to data partitioned by fuzzy k-means could enhance prediction accuracy by up to 26%. Monte Carlo results indicate that the highest and lowest uncertainty belong to the MLR and ANFIS models, respectively. Conclusion: In the present research, different data mining techniques were applied to predict CEC in a wide range of soils.
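Relative improvement is not defined in the abstract. A common definition, assumed here, is RI = 100 × (RMSE_ref - RMSE_model) / RMSE_ref with linear regression as the reference. Applied to the reported RMSE values, it gives about 0.6% for nonlinear regression and about 15% for ANFIS, matching the figures in the text, which suggests (but does not confirm) that this is the definition used. The dictionary labels are shorthand for the methods named in the abstract.

```python
def relative_improvement(rmse_reference, rmse_model):
    """Percentage reduction in RMSE relative to a reference model (assumed definition)."""
    return 100.0 * (rmse_reference - rmse_model) / rmse_reference


# RMSE values (meq 100g-1) reported in the abstract; linear regression is the reference.
reported_rmse = {"MLR": 4.74, "MNR": 4.71, "ANN": 4.61, "regression tree": 4.53, "ANFIS": 4.02}
for name, value in reported_rmse.items():
    print(f"{name}: RI = {relative_improvement(reported_rmse['MLR'], value):.1f}%")
# MLR: RI = 0.0%
# MNR: RI = 0.6%
# ANN: RI = 2.7%
# regression tree: RI = 4.4%
# ANFIS: RI = 15.2%
```

Because RI is relative to one reference, the ranking it produces is the same as ranking by RMSE; it mainly makes the size of each gain easier to read.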
The database of 1770 soil samples was gathered from across Iran. The comparison indicates that the ANFIS model gave the highest prediction accuracy. Moreover, partitioning the database into four groups enhanced the accuracy of the models. This result confirms that pedotransfer functions are reliable only within the range of the data used to develop them. Overall, our efforts yielded an R2 of only 0.58, meaning that soil organic matter and clay percentage could explain only 58% of the variation in CEC. This suggests that more input data should be incorporated, such as the type of clay minerals and the percentages of calcium carbonate and gypsum.